Friedrich-Alexander-Universität Erlangen-Nürnberg Institut

Author

  • Michael Piotrowski
Abstract

The amount of information available in electronic form is growing exponentially, making it increasingly difficult to find the desired information. This is especially true of the World Wide Web, which has no central administration and thus no ordering scheme to help users find the information they need. Furthermore, most of the information is narrative, i.e., in the form of unstructured documents written in natural languages, as opposed to structured information stored in databases. Information retrieval is primarily concerned with the storage and retrieval of unstructured information. Thus, along with the growth of the World Wide Web, information retrieval systems gain importance, since they are often the only way to find the few documents actually relevant to a specific question in the vast quantities of text available. Internet search engines like AltaVista or Lycos are very popular and commercially successful. Although information retrieval systems mainly deal with natural language, linguistic methods are rarely used. Most systems only use stemming, i.e., the mechanical cutting off of inflectional and derivational suffixes to better match index terms to query terms. Since most research on information retrieval is done for English, which has a relatively weak morphology, this is seldom regarded as problematic. Some researchers even consider stemming completely unnecessary. There is, however, considerable evidence that stemming and more linguistically motivated methods do have a positive impact on retrieval performance for languages such as Dutch, German, Italian, or Slovene, which are morphologically richer than English. Morphologic phenomena like compounds and changes of the stem are still not handled by conventional stemmers.
As German, for example, makes extensive use of these morphologic processes (consider compounds like Bundesverfassungsgericht, and changes of the stem as in Häuser, the plural of Haus), the application of full morphologic analysis to the information retrieval task intuitively seems promising. This thesis sets out to determine the usefulness of morphologic analysis in information retrieval systems, particularly for the retrieval of German-language documents. An experimental retrieval system called IRF/1 was developed as a test bed and is described in this thesis. IRF/1 is used to compare the retrieval effectiveness of different text processing methods for a test collection of about 300 magazine articles. The evaluated methods are: 1. stemming (as a baseline), 2. base form reduction using morphologic analysis, 3. same as (2) but compounds are split into the base forms of their constituents, and 4. same …
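The contrast the abstract draws between mechanical suffix stripping and lexicon-based analysis can be illustrated with a small sketch. This is not the IRF/1 implementation, which the abstract does not describe in detail. The suffix list, the three-word lexicon, the linking elements, and the function names `naive_stem` and `split_compound` are all assumptions made up for this example.

```python
from typing import List, Optional

# Hypothetical suffix list for a naive stemmer (illustration only).
SUFFIXES = ("ern", "en", "er", "es", "e", "n", "s")

def naive_stem(word: str, min_stem: int = 4) -> str:
    """Strip the longest matching suffix, keeping at least min_stem characters."""
    w = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w

# Hypothetical toy lexicon of base forms and linking elements.
LEXICON = {"bund", "verfassung", "gericht"}
LINKERS = ("", "es", "s")

def split_compound(word: str) -> Optional[List[str]]:
    """Split a word into lexicon entries, each optionally followed by a linker.

    Returns None if no complete split exists in the toy lexicon.
    """
    w = word.lower()
    if not w:
        return []
    # Try the longest possible first constituent first.
    for end in range(len(w), 0, -1):
        head = w[:end]
        if head not in LEXICON:
            continue
        rest = w[end:]
        for linker in LINKERS:
            if rest.startswith(linker):
                tail = split_compound(rest[len(linker):])
                if tail is not None:
                    return [head] + tail
    return None

# Suffix stripping leaves the umlauted stem, so plural and singular do not match.
print(naive_stem("Häuser"), naive_stem("Haus"))
# A lexicon-based split recovers the constituents of the compound.
print(split_compound("Bundesverfassungsgericht"))
```

Under these assumptions, the stemmer maps Häuser and Haus to different strings, so a query for one would miss documents containing the other. The lexicon-based splitter yields index terms for each constituent of Bundesverfassungsgericht. A real morphologic analyzer would need a full lexicon and rules for stem changes.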


Similar resources

A Novel Method for Contact-Free Cardiac Synchronization Using the Pilot Tone Navigator

Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany, Erlangen Graduate School in Advanced Optical Technologies, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany, Magnetic Resonance, Product Definition and Innovation, Siemens Healthcare GmbH, Erlangen, Germany, Magnetic Resonance, Research and Developme...


HIV Activates the Tyrosine Kinase Hck to Secrete ADAM Protease-Containing Extracellular Vesicles

a Department of Dermatology, Universitätsklinikum Erlangen, Friedrich-Alexander Universität Erlangen-Nürnberg, Hartmannstr. 14, 91054 Erlangen, Germany b Department of Virology, University of Helsinki, PO Box 21, Haartmaninkatu 3, 00014, Finland c Department of Internal Medicine V, Haematology and Oncology, Universitätsklinikum Erlangen, Friedrich-Alexander Universität Erlangen-Nürnberg, Hartma...


Laser Theory for Optomechanics: Limit Cycles in the Quantum Regime

Niels Lörch, Jiang Qian, Aashish Clerk, Florian Marquardt, and Klemens Hammerer Institut für Gravitationsphysik, Leibniz Universität Hannover and Max-Planck-Institut für Gravitationsphysik (Albert-Einstein-Institut), Callinstraße 38, 30167 Hannover, Germany Institut für Theoretische Physik, Leibniz Universität Hannover, Appelstraße 2, 30167 Hannover, Germany Arnold Sommerfeld Center for Theoret...


Novel Projection-based Unsupervised Respiratory Motion Feedback for Free-Breathing Whole-Heart Coronary MR Imaging

Christoph Forman, Davide Piccini, Joachim Hornegger, and Michael O. Zenge Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany, Erlangen Graduate School in Advanced Optical Technologies (SAOT), Friedrich-Alexa...


Isotropic 3-D CINE Imaging with Sub-2mm Resolution in a Single Breath-Hold

Jens Wetzl, Michaela Schmidt, Michael O. Zenge, Felix Lugauer, Laszlo Lazar, Mariappan Nadar, Andreas Maier, Joachim Hornegger, and Christoph Forman Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany, Erlangen Graduate School in Advanced Optical Technologies (SAOT), Friedrich-Alexander-Universität Erlangen-Nürnberg, Erla...


Image Registration for the Alignment of Digitized Historical Documents

AmirAbbas Davari, Tobias Lindenberger, Armin Häberle, Vincent Christlein, Andreas Maier, Christian Riess 1. Pattern Recognition Lab, Computer Science Department, Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen, Germany [amir.davari, tobias.lintob.lindenberger, vincent.christlein, andreas.maier, christian.riess] @fau.de 2. Bibliotheca Hertziana Max-Planck-Institut für Kunstgeschichte...




Publication date: 2000